Max Pellert (https://mpellert.at)
Deep Learning for the Social Sciences
The 1D linear regression model is obviously limited
We want to be able to describe input/output relationships that are not lines
We want multiple inputs
We want multiple outputs
Shallow neural networks
Flexible enough to describe arbitrarily complex input/output mappings
Can have as many inputs as we want
Can have as many outputs as we want
1D Linear Regression
Example shallow network
This model has 10 parameters:
y = φ₀ + φ₁·a[θ₁₀ + θ₁₁x] + φ₂·a[θ₂₀ + θ₂₁x] + φ₃·a[θ₃₀ + θ₃₁x]
Represents a family of functions
Parameters determine particular function
Given parameters can perform inference (run equation)
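Inference really is just running the equation. A minimal sketch in plain NumPy, with made-up parameter values (any ten real numbers would do):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# Hypothetical parameter values for the 10-parameter model:
theta = np.array([[ 0.2, -1.0],   # theta_10, theta_11
                  [-0.5,  0.8],   # theta_20, theta_21
                  [ 0.1,  0.3]])  # theta_30, theta_31
phi = np.array([0.5, -0.4, 0.9, 0.2])  # phi_0 .. phi_3

def shallow_net(x):
    # Hidden units: ReLU of a linear function of the input
    h = relu(theta[:, 0] + theta[:, 1] * x)
    # Output: linear combination of the hidden units
    return phi[0] + phi[1:] @ h

print(shallow_net(1.5))
```

Changing the ten numbers picks out a different member of the family of functions.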
Given training dataset:
Define loss function (least squares)
Change parameters to minimize loss function
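A minimal sketch of that training loop, assuming a tiny synthetic dataset and full-batch gradient descent; the gradient is approximated by finite differences just to stay dependency-free (a real implementation would use automatic differentiation):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def predict(params, x):
    # 10 parameters: three (offset, slope) pairs for the hidden units,
    # then phi_0..phi_3 for the output combination
    t = params[:6].reshape(3, 2)
    phi = params[6:]
    h = relu(t[:, 0:1] + t[:, 1:2] * x)
    return phi[0] + phi[1:] @ h

def loss(params, x, y):
    # Least-squares loss, averaged over the training set
    return np.mean((predict(params, x) - y) ** 2)

# Tiny synthetic training set (made up for illustration)
x = np.linspace(-2.0, 2.0, 20)
y = np.abs(x)

rng = np.random.default_rng(0)
params = rng.normal(size=10)
init_loss = loss(params, x, y)

lr, eps = 0.05, 1e-6
for _ in range(2000):
    # Crude finite-difference gradient estimate
    grad = np.zeros_like(params)
    for i in range(len(params)):
        d = np.zeros(len(params))
        d[i] = eps
        grad[i] = (loss(params + d, x, y) - loss(params - d, x, y)) / (2 * eps)
    params -= lr * grad

print(init_loss, "->", loss(params, x, y))
```

Each step nudges the parameters downhill on the loss surface; the loss after training should be well below the initial value.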
Piecewise linear functions with three joints
Break down into two parts:
h₁ = a[θ₁₀ + θ₁₁x],  h₂ = a[θ₂₀ + θ₂₁x],  h₃ = a[θ₃₀ + θ₃₁x]
y = φ₀ + φ₁h₁ + φ₂h₂ + φ₃h₃
where:
a[z] = max(0, z) is the ReLU (rectified linear unit)
Example shallow network = piecewise linear functions
1 “joint” per ReLU function
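The ReLU is where the joints come from: it is linear on each side of zero, so each hidden unit contributes one potential joint at the input value where its pre-activation crosses zero. A quick check for one hypothetical hidden unit:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# Hypothetical hidden unit: pre-activation theta_0 + theta_1 * x
theta_0, theta_1 = -1.0, 2.0

# The joint sits where the pre-activation crosses zero:
joint = -theta_0 / theta_1
print(joint)

xs = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
# Output is clipped to zero below the joint, linear above it
print(relu(theta_0 + theta_1 * xs))
```

Three hidden units therefore give at most three joints, and four linear regions.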
Which hidden units are activated in the shaded region?
Shaded region:
Unit 1 active
Unit 2 inactive
Unit 3 active
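The same check can be done numerically at any input x: a unit is active exactly when its pre-activation is positive. The figure's actual parameters aren't reproduced here, so the values below are made up to give the same activation pattern:

```python
import numpy as np

# Made-up offsets and slopes for three hidden units (not the figure's values)
theta = np.array([[ 1.0,  1.0],   # unit 1
                  [-0.5, -1.0],   # unit 2
                  [ 0.2,  2.0]])  # unit 3

x = 0.3  # a point inside the region of interest
pre_activations = theta[:, 0] + theta[:, 1] * x
active = pre_activations > 0
print(active)  # which units pass through their ReLU unclipped
```

Inactive units are clipped to zero, so within the region only the active units shape the output.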
Each parameter multiplies its source and adds to its target
With three hidden units, like before:
y = φ₀ + Σ_{d=1..3} φ_d · a[θ_{d0} + θ_{d1}x]
With D hidden units:
y = φ₀ + Σ_{d=1..D} φ_d · a[θ_{d0} + θ_{d1}x]
…we can describe any 1D function to arbitrary accuracy
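One way to see this concretely: a shallow ReLU network can reproduce any piecewise-linear interpolant of a target function, and adding hidden units (joints) shrinks the error. A sketch, using a direct construction rather than training (the helper `relu_interpolant` is illustrative, not a standard library function):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_interpolant(f, lo, hi, D):
    # Build a shallow ReLU network that linearly interpolates f at D + 1
    # evenly spaced knots: y = f(lo) + sum_d w_d * relu(x - c_d)
    c = np.linspace(lo, hi, D + 1)
    slopes = np.diff(f(c)) / np.diff(c)   # slope on each segment
    w = np.diff(slopes, prepend=0.0)      # slope change at each joint
    def net(x):
        return f(lo) + w @ relu(x[None, :] - c[:-1, None])
    return net

f = np.sin
xs = np.linspace(0.0, np.pi, 1000)
errs = []
for D in (4, 16, 64):
    net = relu_interpolant(f, 0.0, np.pi, D)
    errs.append(np.max(np.abs(net(xs) - f(xs))))
print(errs)  # maximum error shrinks as D grows
```

The error of the piecewise-linear fit falls roughly quadratically in the knot spacing, so more hidden units buy arbitrary accuracy on a bounded interval.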
“A formal proof that, with enough hidden units, a shallow neural network can describe any continuous function on a compact subset of ℝᴰ to arbitrary precision”
1 input, 4 hidden units, 2 outputs
2 inputs, 3 hidden units, 1 output
The general case: Dᵢ inputs, D hidden units, Dₒ outputs
e.g. three inputs, three hidden units, two outputs
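With multiple inputs and outputs, the per-parameter picture collapses into matrix form: a weight matrix and bias vector per layer. A sketch of the three-input, three-hidden-unit, two-output case with randomly chosen (hypothetical) parameters:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

rng = np.random.default_rng(1)
# Hypothetical parameters: weight matrices and bias vectors
Theta = rng.normal(size=(3, 3))   # hidden weights: 3 hidden units x 3 inputs
theta0 = rng.normal(size=3)       # hidden biases
Phi = rng.normal(size=(2, 3))     # output weights: 2 outputs x 3 hidden units
phi0 = rng.normal(size=2)         # output biases

x = np.array([0.5, -1.0, 2.0])    # three inputs
pre = theta0 + Theta @ x          # pre-activations
h = relu(pre)                     # activations (hidden units)
y = phi0 + Phi @ h                # two outputs
print(y.shape)
```

The biases are the y-offsets and the weight matrix entries are the slopes, which is exactly the terminology that follows.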
Y-offsets = biases
Slopes = weights
Everything in one layer connected to everything in the next = fully connected network
No loops = feedforward network
Values after ReLU (activation functions) = activations
Values before ReLU = pre-activations
One hidden layer = shallow neural network
More than one hidden layer = deep neural network
Number of hidden units = network capacity